Modeling a neuronal controller exhibiting human postural sway

ABSTRACT

Conventionally, a neuronal controller located inside the central nervous system and governing the maintenance of the upright posture of the human body is designed from a control system perspective using proportional-integral-derivative (PID) controllers, wherein human postural sway is modeled either along a sagittal plane or along a frontal plane separately, resulting in limited insight into the intricacies of the governing neuronal controller. Also, existing neuronal controllers using a reinforcement learning (RL) paradigm are based on complex actor-critic on-policy algorithms. Analyzing human postural sway is critical to detect markers for progression of balance impairments. The present disclosure facilitates modeling the neuronal controller using a simplified RL algorithm capable of producing postural sway characteristics in both the sagittal plane and the frontal plane together. The Q-learning technique of the RL paradigm is employed for learning an optimal state-action value (Q-value) function for a tuneable Markov Decision Process (MDP) model.

PRIORITY CLAIM

This U.S. patent application claims priority under 35 U.S.C. § 119 to: Indian Patent Application No. 201921012442, filed on 29 Mar. 2019. The entire contents of the aforementioned application are incorporated herein by reference.

TECHNICAL FIELD

The disclosure herein generally relates to postural sway analyses, and, more particularly, to systems and computer implemented methods for modeling a neuronal controller exhibiting human postural sway.

BACKGROUND

Analyzing human postural sway is critical to detect markers for progression of balance impairments. Typically, balance impairments are caused by altered functions of the central nervous system or of sensory or motor functions, which may be due to age or pathology. Conditions such as Parkinson's disease and peripheral neuropathy may be identified by analyzing the postural sway. Conventionally, a neuronal controller located inside the central nervous system and governing the maintenance of the upright posture of the human body is designed from a control system perspective using one or more proportional-integral-derivative (PID) controllers, wherein human postural sway is modeled either along a sagittal plane or along a frontal plane separately, resulting in limited insight into the intricacies of the governing neuronal controller. Some attempts have also been made to model the neuronal controller using a reinforcement learning (RL) paradigm based on complex actor-critic on-policy algorithms.

SUMMARY

Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.

In an aspect, there is provided a processor implemented method for modeling a neuronal controller exhibiting human postural sway comprising the steps of: modeling, by one or more hardware processors, the neuronal controller in the form of a Reinforcement Learning (RL) agent based on an inverted pendulum with 1 Degree Of Freedom (1 DOF) representing a first mechanical model of a human body in the form of a dynamical system, wherein the RL agent is trained using a Q-learning technique to learn an optimal state-action value (Q-value) function for a tuneable Markov Decision Process (MDP) model, wherein the modeled neuronal controller is configured to reproduce Center Of Pressure (COP) characteristics in the human body either along a sagittal plane or along a frontal plane separately; deriving, by the one or more hardware processors, dynamical equations of a Spherical Inverted Pendulum (SIP) with respect to a global coordinate system, wherein the SIP represents a second mechanical model of the human body that exhibits postural sway along both the frontal plane and the sagittal plane together, wherein the dynamical equations are derived by using Lagrange's equations with two independent state variables (θx and θy) being angular deviation of the SIP about a pivot joint and along x and y axes respectively of the global coordinate system, wherein the pivot joint characterizes an ankle joint of the human body; and modeling, by the one or more hardware processors, the human postural sway both along the sagittal plane and along the frontal plane together using the modeled neuronal controller and the derived dynamical equations of the SIP by tuning (i) a reward function comprised in the modeled neuronal controller and (ii) a set of parameters to balance the SIP such that the postural sway characteristics generated by the modeled neuronal controller match the postural sway characteristics of one or more control subjects, wherein the set of parameters include parameters of the MDP model and parameters associated with physiology of the human body.

In another aspect, there is provided a system for modeling a neuronal controller exhibiting human postural sway, the system comprising: one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to: model the neuronal controller in the form of a Reinforcement Learning (RL) agent based on an inverted pendulum with 1 Degree Of Freedom (1 DOF) representing a first mechanical model of a human body in the form of a dynamical system, wherein the RL agent is trained using a Q-learning technique to learn an optimal state-action value (Q-value) function for a tuneable Markov Decision Process (MDP) model, wherein the modeled neuronal controller is configured to reproduce Center Of Pressure (COP) characteristics in the human body either along a sagittal plane or along a frontal plane separately; derive dynamical equations of a Spherical Inverted Pendulum (SIP) with respect to a global coordinate system, wherein the SIP represents a second mechanical model of the human body that exhibits postural sway along both the frontal plane and the sagittal plane together, wherein the dynamical equations are derived by using Lagrange's equations with two independent state variables (θx and θy) being angular deviation of the SIP about a pivot joint and along x and y axes respectively of the global coordinate system, wherein the pivot joint characterizes an ankle joint of the human body; and model the human postural sway both along the sagittal plane and along the frontal plane together using the modeled neuronal controller and the derived dynamical equations of the SIP by tuning (i) a reward function comprised in the modeled neuronal controller and (ii) a set of parameters to balance the SIP such that the postural sway characteristics generated by the modeled neuronal controller match the postural sway characteristics of one or more control subjects, wherein the set of parameters include parameters of the MDP model and parameters associated with physiology of the human body.

In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: model the neuronal controller in the form of a Reinforcement Learning (RL) agent based on an inverted pendulum with 1 Degree Of Freedom (1 DOF) representing a first mechanical model of a human body in the form of a dynamical system, wherein the RL agent is trained using a Q-learning technique to learn an optimal state-action value (Q-value) function for a tuneable Markov Decision Process (MDP) model, wherein the modeled neuronal controller is configured to reproduce Center Of Pressure (COP) characteristics in the human body either along a sagittal plane or along a frontal plane separately; derive dynamical equations of a Spherical Inverted Pendulum (SIP) with respect to a global coordinate system, wherein the SIP represents a second mechanical model of the human body that exhibits postural sway along both the frontal plane and the sagittal plane together, wherein the dynamical equations are derived by using Lagrange's equations with two independent state variables (θx and θy) being angular deviation of the SIP about a pivot joint and along x and y axes respectively of the global coordinate system, wherein the pivot joint characterizes an ankle joint of the human body; and model the human postural sway both along the sagittal plane and along the frontal plane together using the modeled neuronal controller and the derived dynamical equations of the SIP by tuning (i) a reward function comprised in the modeled neuronal controller and (ii) a set of parameters to balance the SIP such that the postural sway characteristics generated by the modeled neuronal controller match the postural sway characteristics of one or more control subjects, wherein the set of parameters include parameters of the MDP model and parameters associated with physiology of the human body.

In accordance with an embodiment of the present disclosure, the Q-learning technique is configured to learn to generate a torque representing an action for each state of the inverted pendulum by: receiving proprioceptive inputs in the form of angle and angular velocity of the pivot joint of the inverted pendulum; and generating the torque at the pivot joint of the inverted pendulum based on the received proprioceptive inputs to maintain an upright posture of the inverted pendulum.

In accordance with an embodiment of the present disclosure, the one or more hardware processors are configured to perform the step of modeling a neuronal controller by reproducing the Center Of Pressure (COP) characteristics in the human body either along the sagittal plane or along the frontal plane separately in accordance with a dynamic equation based on overall mass m of the human body being concentrated at the point of Center Of Mass (COM) located at the second lumbar vertebra of the human body, an approximate height L of the second lumbar vertebra, moment of inertia of the inverted pendulum I, angle of the inverted pendulum θ with respect to the direction of gravity, gravitational acceleration g, a stiffness constant K and a damping constant B, wherein the stiffness constant and the damping constant denote pivot joint properties of the inverted pendulum.

In accordance with an embodiment of the present disclosure, the parameters of the MDP model include: n_(θ) representing resolution of states in θ domain (from θ_(max) to −θ_(max)); n_({dot over (θ)}) representing resolution of states in {dot over (θ)} domain (from {dot over (θ)}_(max) to −{dot over (θ)}_(max)); n_(A) representing resolution of actions in τ domain (from τ_(max) to −τ_(max)); τ_(max) representing the maximum torque exertable on the pivot joint or the boundary of the τ domain; θ_(max) representing the boundary of the θ domain; {dot over (θ)}_(max) representing a finite boundary of the {dot over (θ)} domain; p_(a) representing property of a curve from which torque values in the τ domain are sampled; p_(θ) representing property of a curve from which boundaries of states of the θ domain are sampled; and p_({dot over (θ)}) representing property of a curve from which boundaries of the states of the {dot over (θ)} domain are sampled; and the parameters associated with physiology of the human body include: White Gaussian Noise (WGN) added to the generated torque to represent a real world noisy biological system; filtering factor (λ) added to portray signaling characteristics of a neuromuscular junction; and Scaling Factor (SF) introduced to scale down the magnitude of the generated torque for each state of the inverted pendulum after the training of the RL agent is completed, wherein completion of the training is represented by a balanced SIP.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.

FIG. 1 illustrates an exemplary block diagram of a system for modeling a neuronal controller exhibiting human postural sway, in accordance with an embodiment of the present disclosure.

FIG. 2 illustrates a high level architecture enabling a method for modeling a neuronal controller exhibiting human postural sway, in accordance with an embodiment of the present disclosure.

FIG. 3 illustrates an exemplary flow diagram of a computer implemented method for modeling a neuronal controller exhibiting human postural sway, in accordance with an embodiment of the present disclosure.

FIG. 4 illustrates a 1 Degree Of Freedom (1 DOF) inverted pendulum representing a free body diagram of a simplified human body in the sagittal plane, in accordance with an embodiment of the present disclosure.

FIG. 5 illustrates a Spherical Inverted Pendulum (SIP) having 2 DOF with two state variables, θx and θy, in accordance with an embodiment of the present disclosure.

FIG. 6A and FIG. 6B illustrate Center Of Pressure (COP) characteristics along x axis and a corresponding frequency spectrum respectively; and FIG. 6C and FIG. 6D illustrate the COP characteristics along y axis and a corresponding frequency spectrum respectively, for a set of parameters, without introducing parameters associated with physiology of the human body, in accordance with an embodiment of the present disclosure.

FIG. 7A and FIG. 7B illustrate variation of the angle of the SIP and the angular velocity of the SIP with respect to the x axis and the y axis respectively (θ_(x), θ_(y), {dot over (θ)}_(x), {dot over (θ)}_(y)) with respect to time; FIG. 7C illustrates a plot of applied torques τ_(x), τ_(y) at a pivot joint by the neuronal controller; and FIG. 7D illustrates COP movement along the x axis and the y axis for the set of parameters referred to in FIG. 6A through FIG. 6D.

FIG. 8A and FIG. 8B illustrate the COP characteristics along the x axis and a corresponding frequency spectrum respectively; and FIG. 8C and FIG. 8D illustrate the COP characteristics along the y axis and a corresponding frequency spectrum respectively, after introducing parameters associated with physiology of the human body, in accordance with an embodiment of the present disclosure.

FIG. 9A and FIG. 9B illustrate the variation of the angle of the SIP and the angular velocity of the SIP with respect to the x axis and the y axis respectively (θ_(x), θ_(y), {dot over (θ)}_(x), {dot over (θ)}_(y)) with respect to time; FIG. 9C illustrates a plot of applied torques τ_(x), τ_(y) at a pivot joint by the neuronal controller; and FIG. 9D illustrates COP movement along the x axis and the y axis for the parameters used with reference to FIG. 8A through FIG. 8D.

DETAILED DESCRIPTION

Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.

State of the art techniques for analyzing human postural sway model the sway either along a sagittal plane or along a frontal plane separately. Modeling separately along the anatomical planes gives only a lower-dimensional projection of higher-dimensional dynamics, which in turn compromises the overall picture of the human postural sway. Also, earlier attempts were directed to designing the neuronal controller, which is located inside the central nervous system and governs the maintenance of the upright posture of the human body, from a control system perspective using PID controllers. Using a PID controller gives rise to only a limited set of parameters with very little neurophysiological significance. Some attempts to model the neuronal controller using a Reinforcement Learning (RL) paradigm used actor-critic on-policy algorithms that address optimization of two redundant parallel entities, viz., the ‘actor’ (the policy) and the ‘critic’ (the value function), which is relatively complex and computationally inefficient.

The methods and systems of the present disclosure firstly facilitate designing the neuronal controller using a simplified RL algorithm capable of producing diverse sway characteristics either in the sagittal plane or in the frontal plane separately using a simple inverted pendulum representing a mechanical model of a human body. Dynamical equations of a Spherical Inverted Pendulum (SIP) are derived with respect to a global coordinate system. The modeled neuronal controller is then adapted to exhibit diverse sway characteristics in the sagittal plane and in the frontal plane together. The present disclosure also provides parameters which enable the sway characteristics exhibited by the modeled neuronal controller to be close to normal and pathological conditions seen in the real world. Systems and methods of the present disclosure find application in analyzing the postural balance of a subject so that subject specific rehabilitation protocols may be recommended.

Referring now to the drawings, and more particularly to FIG. 1 through FIG. 9D, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and/or method.

FIG. 1 illustrates an exemplary block diagram of a system 100 for modeling a neuronal controller exhibiting human postural sway, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more hardware processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In the context of the present disclosure, the expressions ‘processors’ and ‘hardware processors’ may be used interchangeably. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.

The I/O interface(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface(s) can include one or more ports for connecting a number of devices to one another or to another server.

The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.

FIG. 2 illustrates a high level architecture enabling a method for modeling a neuronal controller exhibiting human postural sway, in accordance with an embodiment of the present disclosure. FIG. 3 illustrates an exemplary flow diagram of a computer implemented method 300 for modeling a neuronal controller exhibiting human postural sway, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more data storage devices or memory 102 operatively coupled to the one or more processors 104 and is configured to store instructions configured for execution of steps of the method 300 by the one or more processors 104. The steps of the method 300 will now be explained in detail with reference to the components of the system 100 of FIG. 1 and the architecture of FIG. 2 . Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.

In accordance with an embodiment of the present disclosure, the one or more processors 104 are configured to model, at step 302, the neuronal controller in the form of a Reinforcement Learning (RL) agent based on an inverted pendulum with 1 Degree of Freedom (1 DOF) representing a first mechanical model of a human body in the form of a dynamical system. In an embodiment, the RL agent is trained using a Q-learning technique to learn an optimal state-action value (Q-value) function for a tuneable Markov Decision Process (MDP) model, wherein the modeled neuronal controller is configured to reproduce Center Of Pressure (COP) characteristics in the human body either along the sagittal plane or along the frontal plane separately.

The modeling of the MDP for the RL agent, in accordance with an embodiment of the present disclosure, is performed as described hereinafter. A Markov decision process contains the elements [S, A, P(ś|s, a), E[r|ś, a, s], γ], wherein S represents a set of states, A represents a set of actions, P(ś|s, a) represents a transition probability, γ represents a discount factor and E[r|ś, a, s] represents an expected reward for a current state s, immediate action a and next state ś. The model based algorithm has no stochasticity associated with state transition. Accordingly, given a particular state s (in terms of θ and {dot over (θ)}) and action a (in terms of torque at a pivot joint), the system dynamics reveal the immediate next state. A reward policy or a reward function (used interchangeably in the context of the present disclosure) is also modeled deterministically. The set of states S, the set of actions A and the reward policy are modeled as described hereinafter.
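As an illustration of the deterministic state transition described above, the following Python sketch advances the state (θ, {dot over (θ)}) of a 1 DOF inverted pendulum by one time step for a given torque. The form of the dynamic equation used here (I·θ̈ = m·g·L·sin θ − K·θ − B·θ̇ + τ), the point-mass moment of inertia, the semi-implicit Euler integration, the time step and all numerical values are assumptions made for illustration only and are not stated in this part of the disclosure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PendulumParams:
    """Hypothetical parameter values, chosen only for illustration."""
    m: float = 70.0   # body mass in kg (assumed)
    L: float = 1.0    # height of the COM above the pivot in m (assumed)
    g: float = 9.81   # gravitational acceleration in m/s^2
    K: float = 0.0    # pivot joint stiffness constant (assumed)
    B: float = 0.0    # pivot joint damping constant (assumed)

    @property
    def I(self) -> float:
        # Point mass at distance L from the pivot (assumption).
        return self.m * self.L ** 2


def next_state(theta, theta_dot, tau, params=PendulumParams(), dt=0.01):
    """Deterministic transition: return the next (theta, theta_dot).

    Assumed dynamics:
        I * theta_ddot = m*g*L*sin(theta) - K*theta - B*theta_dot + tau
    """
    theta_ddot = (
        params.m * params.g * params.L * math.sin(theta)
        - params.K * theta
        - params.B * theta_dot
        + tau
    ) / params.I
    new_theta_dot = theta_dot + dt * theta_ddot   # semi-implicit Euler step
    new_theta = theta + dt * new_theta_dot
    return new_theta, new_theta_dot


# The same (state, action) pair always yields the same next state.
print(next_state(0.05, 0.0, tau=-20.0))
```

Because no random term enters the transition, a transition probability table is unnecessary in such a sketch; the next state is simply computed from the current state and the chosen torque.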

Modeling the set of actions A: The actions are sampled using a curve represented as:

$y = \begin{cases} \tau_{\max}\, a\, |x|^{p}, & x \geq 0 \\ -\tau_{\max}\, a\, |x|^{p}, & x < 0 \end{cases} \qquad a = \frac{1}{\left( \frac{n_{A} - 1}{2} \right)^{p}} \qquad (1)$ where n_(A) represents the number of actions, p represents an exponent that defines the characteristics of the curve represented in equation (1), and τ_(max) represents a maximum torque applied to the pivot joint. Let the set X_(A)={all integers n|−(n_(A)−1)/2≤n≤(n_(A)−1)/2}, where x_(a)∈X_(A) and n denotes an integer number. To obtain n_(A) actions, the curve represented by equation (1) is sampled at every element of the set X_(A), i.e., A={all y from equation (1)|x∈X_(A)}.
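The sampling of the action set from equation (1) may be sketched in Python as follows. This is a minimal illustration rather than the implementation of the disclosure: the function name and the example values of n_(A), τ_(max) and p are hypothetical, and the sketch assumes that n_(A) is an odd integer (so that (n_(A)−1)/2 is an integer) and that p is positive.

```python
def sample_actions(n_a: int, tau_max: float, p: float) -> list[float]:
    """Sample n_a torque values from the curve of equation (1)."""
    if n_a < 3 or n_a % 2 == 0:
        raise ValueError("n_a is assumed to be an odd integer >= 3")
    half = (n_a - 1) // 2
    a = 1.0 / (half ** p)  # scale so that |x| = half maps to tau_max
    actions = []
    for x in range(-half, half + 1):          # all elements of X_A
        magnitude = tau_max * a * abs(x) ** p
        actions.append(magnitude if x >= 0 else -magnitude)
    return actions


# Hypothetical example: 5 actions, maximum torque 10, quadratic curve.
print(sample_actions(n_a=5, tau_max=10.0, p=2.0))
# [-10.0, -2.5, 0.0, 2.5, 10.0]
```

With p greater than 1, the sampled torques cluster around zero, which gives finer control for small corrections; p equal to 1 gives uniformly spaced torques.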

Modeling the set of States (S): The system state comprises two state variables, θ and {dot over (θ)}. The scope of the state variable θ spans from −θ_(max) to θ_(max) and is represented as θ_(span). Since the present disclosure deals with a discrete MDP, the θ and {dot over (θ)} space is discretized by defining boundaries. Boundaries of ‘sub-states’ in θ space are sampled from a curve represented by equation (2) below.

$y = \begin{cases} \theta_{\max}\, a\, |x|^{p}, & x \geq 0 \\ -\theta_{\max}\, a\, |x|^{p}, & x < 0 \end{cases} \qquad a = \frac{1}{\left( \frac{n_{\theta} + 1}{2} \right)^{p}} \qquad (2)$ where n_(θ) represents the number of sub-states in the θ space inside θ_(span), and wherein n_(θ) is chosen to be an odd integer. If a set

$X_{\theta} = \left\{ \text{all integers } x_{\theta} \;\middle|\; -\frac{n_{\theta} + 1}{2} \leq x_{\theta} \leq \frac{n_{\theta} + 1}{2} \text{ and } x_{\theta} \neq 0 \right\},$ where x_(θ)∈X_(θ), is defined, the n_(θ)+1 boundaries required to define n_(θ) states may be sampled from equation (2) for all elements of the set X_(θ). The set containing these boundaries may be represented as given below. θ_(SB)={all y from equation (2)|x∈X_(θ)} or, $\theta_{SB_{i}} \in \theta_{SB}$ where i∈X_(θ)
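The sampling of the θ boundaries from equation (2) may be sketched in Python as below. The function name and the example values are hypothetical; the boundaries are stored in a dictionary keyed by the integer index x_(θ) so that the index 0, which is excluded from the set X_(θ), is simply absent.

```python
def sample_theta_boundaries(n_theta: int, theta_max: float, p: float) -> dict[int, float]:
    """Sample the n_theta + 1 sub-state boundaries from the curve of equation (2)."""
    if n_theta < 1 or n_theta % 2 == 0:
        raise ValueError("n_theta is chosen to be an odd integer")
    half = (n_theta + 1) // 2
    a = 1.0 / (half ** p)  # scale so that |x| = half maps to theta_max
    boundaries = {}
    for x in range(-half, half + 1):
        if x == 0:
            continue  # x = 0 is excluded from the set X_theta
        magnitude = theta_max * a * abs(x) ** p
        boundaries[x] = magnitude if x > 0 else -magnitude
    return boundaries


# Hypothetical example: 3 sub-states inside theta_span = [-0.5, 0.5] rad.
print(sample_theta_boundaries(n_theta=3, theta_max=0.5, p=1.0))
# {-2: -0.5, -1: -0.25, 1: 0.25, 2: 0.5}
```

In this example the four boundaries delimit three sub-states: [−0.5, −0.25), [−0.25, 0.25) and [0.25, 0.5).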

On similar lines, the scope of the state variable {dot over (θ)} spans from −∞ to ∞. The first sub-state spans from −∞ to −{dot over (θ)}_(max) and the last (n_({dot over (θ)})-th) sub-state spans from {dot over (θ)}_(max) to ∞, where n_({dot over (θ)}) represents the number of sub-states in the {dot over (θ)} domain. From −{dot over (θ)}_(max) to {dot over (θ)}_(max), there are n_({dot over (θ)})−2 states, the boundaries of which are defined by taking n_({dot over (θ)})−1 samples, one for each element of a set

$X_{\dot{\theta}} = \left\{ \text{all integers } x_{\dot{\theta}} \;\middle|\; -\frac{n_{\dot{\theta}} - 1}{2} \leq x_{\dot{\theta}} \leq \frac{n_{\dot{\theta}} - 1}{2} \text{ and } x_{\dot{\theta}} \neq 0 \right\},$ where x_({dot over (θ)})∈X_({dot over (θ)}), from a curve represented by equation (3) below.

$y = \begin{cases} \dot{\theta}_{\max}\, a\, |x|^{p}, & x \geq 0 \\ -\dot{\theta}_{\max}\, a\, |x|^{p}, & x < 0 \end{cases} \qquad a = \frac{1}{\left( \frac{n_{\dot{\theta}} - 1}{2} \right)^{p}} \qquad (3)$ where n_({dot over (θ)}) represents the number of sub-states in the {dot over (θ)} space.

Also, the set containing these boundaries may be represented as given below.

{dot over (θ)}_(SB)={all y from equation (3)|x∈X_({dot over (θ)})}

or, {dot over (θ)}_(SB) _(i) ∈{dot over (θ)}_(SB) where, i∈X_({dot over (θ)})

If S_(θ) is a set of sub-states corresponding to the state variable θ, then,

$S_{\theta} = \left\{ S_{\theta_{i}} \;\middle|\; i \in \left\{ \text{all integers } n \;\middle|\; -\frac{n_{\theta} - 1}{2} \leq n \leq \frac{n_{\theta} - 1}{2} \right\} \right\},$ wherein the i^(th) element of the set spans the θ space as shown below.

$\begin{array}{ll} \theta_{SB_{i-1}} \leq \theta < \theta_{SB_{i}} & \text{for } i < 0; \\ \theta_{SB_{i-1}} \leq \theta < \theta_{SB_{i+1}} & \text{for } i = 0; \\ \theta_{SB_{i}} \leq \theta < \theta_{SB_{i+1}} & \text{for } i > 0. \end{array}$

Similarly, the set corresponding to the state variable {dot over (θ)} is

$S_{\dot{\theta}} = \left\{ S_{\dot{\theta}_{i}} \;\middle|\; i \in \left\{ \text{all integers } n \;\middle|\; -\frac{n_{\dot{\theta}} - 1}{2} \leq n \leq \frac{n_{\dot{\theta}} - 1}{2} \right\} \right\},$

and the i^(th) element of the set spans the {dot over (θ)} space as shown below.

$\begin{array}{ll} \dot{\theta} < \dot{\theta}_{SB_{i}} & \text{for } i = -\frac{n_{\dot{\theta}} - 1}{2}; \\ \dot{\theta} > \dot{\theta}_{SB_{i}} & \text{for } i = \frac{n_{\dot{\theta}} - 1}{2}; \\ \dot{\theta}_{SB_{i-1}} \leq \dot{\theta} < \dot{\theta}_{SB_{i}} & \text{for } -\frac{n_{\dot{\theta}} - 1}{2} < i < 0; \\ \dot{\theta}_{SB_{i-1}} \leq \dot{\theta} < \dot{\theta}_{SB_{i+1}} & \text{for } i = 0; \\ \dot{\theta}_{SB_{i}} \leq \dot{\theta} < \dot{\theta}_{SB_{i+1}} & \text{for } 0 < i < \frac{n_{\dot{\theta}} - 1}{2}. \end{array}$

The elements of the set S are combinations of the elements of the sets S_(θ) and S_({dot over (θ)}) respectively.
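The mapping from a continuous state (θ, {dot over (θ)}) to a pair of sub-state indices, following the piecewise definitions above, may be sketched in Python as below. The function names and the boundary values are hypothetical. One deliberate deviation is labeled in the code: the last {dot over (θ)} sub-state is tested with ≥ rather than the strict > so that a value exactly on the outermost boundary is still assigned to a sub-state.

```python
def _bounds(i, sb):
    """Lower and upper boundary of sub-state i from the boundary dictionary sb."""
    if i < 0:
        return sb[i - 1], sb[i]
    if i == 0:
        return sb[-1], sb[1]
    return sb[i], sb[i + 1]


def theta_substate(theta, sb, n_theta):
    """Index of the theta sub-state, or None when theta is outside theta_span."""
    half = (n_theta - 1) // 2
    for i in range(-half, half + 1):
        lo, hi = _bounds(i, sb)
        if lo <= theta < hi:
            return i
    return None


def theta_dot_substate(theta_dot, sb, n_theta_dot):
    """Index of the theta_dot sub-state; the two outer sub-states are unbounded."""
    half = (n_theta_dot - 1) // 2
    if theta_dot < sb[-half]:
        return -half
    if theta_dot >= sb[half]:  # assumption: >= instead of > to cover the boundary
        return half
    for i in range(-half + 1, half):
        lo, hi = _bounds(i, sb)
        if lo <= theta_dot < hi:
            return i
    raise AssertionError("unreachable when the boundaries are sorted")


# Hypothetical boundaries for n_theta = 3 and n_theta_dot = 5.
theta_sb = {-2: -0.5, -1: -0.25, 1: 0.25, 2: 0.5}
theta_dot_sb = {-2: -1.0, -1: -0.5, 1: 0.5, 2: 1.0}
state = (theta_substate(0.3, theta_sb, 3), theta_dot_substate(-0.7, theta_dot_sb, 5))
print(state)  # (1, -1)
```

Each element of the set S then corresponds to one such index pair, which can be used directly as a key into a Q-value look-up table.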

Modeling the Reward (E[r|ś,a,s]): In accordance with the present disclosure, the reward policy or the reward function may be modeled in multiple approaches as discussed hereinafter.

Reward policy-1: The RL agent gets a reward of +1 when ś (state of the system at t+1) falls inside S_(θ) ₀ (which spans between the two nearest θ boundaries on either side of θ=0 as specified in θ_(SB)), i.e., the reward states belong to the set:

$S_{R} = \left\{ \left( S_{\theta_{0}}, S_{\dot{\theta}_{-\frac{n_{\dot{\theta}} - 1}{2}}} \right), \left( S_{\theta_{0}}, S_{\dot{\theta}_{-\frac{n_{\dot{\theta}} - 3}{2}}} \right), \ldots, \left( S_{\theta_{0}}, S_{\dot{\theta}_{0}} \right), \ldots, \left( S_{\theta_{0}}, S_{\dot{\theta}_{\frac{n_{\dot{\theta}} - 3}{2}}} \right), \left( S_{\theta_{0}}, S_{\dot{\theta}_{\frac{n_{\dot{\theta}} - 1}{2}}} \right) \right\}$

Reward policy-2: The RL agent gets a reward of +1 when ś falls inside the sub-state S_(θ) ₀ and S_({dot over (θ)}) ₀ simultaneously, in which case the reward state is only one state out of all the elements of the set S. So, S_(R)={(S_(θ) ₀ , S_({dot over (θ)}) ₀ )}. The significance of this reward policy is that it encodes the objective of making the inverted pendulum reach the vertical state and stay in the same state.

Reward policy-3: A Gaussian reward policy over the state space is represented as follows:

$r = e^{-\left( \frac{\theta}{\sigma_{\theta}} \right)^{2} - \left( \frac{\dot{\theta}}{\sigma_{\dot{\theta}}} \right)^{2}}$ where σ_(θ) and σ_({dot over (θ)}) are standard deviations of the Gaussian reward policy over the θ and {dot over (θ)} dimensions respectively. Lower values of σ_(θ) and σ_({dot over (θ)}) narrow the region of the state space in which a significant reward is obtained. Also, this is a continuous reward policy, as opposed to Reward policy-1 and Reward policy-2.

Reward policy-4: A negative reward policy is also adopted frequently, where the RL agent gets a negative reward of −10 (i.e., a punishment of 10) when the inverted pendulum falls out of θ_(max) and {dot over (θ)}_(max).
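The four reward policies described above may be sketched in Python as follows. The function names are hypothetical. Two points are assumptions rather than statements of the disclosure: a reward of 0 is returned whenever the stated condition is not met, and Reward policy-4 treats the pendulum as fallen when either θ or {dot over (θ)} leaves its bounded domain.

```python
import math


def reward_policy_1(i_theta: int, i_theta_dot: int) -> float:
    """+1 when the next state lies in sub-state S_theta_0, for any theta_dot sub-state."""
    return 1.0 if i_theta == 0 else 0.0


def reward_policy_2(i_theta: int, i_theta_dot: int) -> float:
    """+1 only when the next state lies in S_theta_0 and S_theta_dot_0 simultaneously."""
    return 1.0 if (i_theta == 0 and i_theta_dot == 0) else 0.0


def reward_policy_3(theta, theta_dot, sigma_theta, sigma_theta_dot) -> float:
    """Continuous Gaussian reward over the (theta, theta_dot) state space."""
    return math.exp(-(theta / sigma_theta) ** 2 - (theta_dot / sigma_theta_dot) ** 2)


def reward_policy_4(theta, theta_dot, theta_max, theta_dot_max) -> float:
    """-10 when the pendulum leaves the bounded domain (assumed: either bound)."""
    fell = abs(theta) > theta_max or abs(theta_dot) > theta_dot_max
    return -10.0 if fell else 0.0


# Hypothetical values: upright and at rest gives the maximum Gaussian reward.
print(reward_policy_3(0.0, 0.0, sigma_theta=0.1, sigma_theta_dot=0.5))  # 1.0
print(reward_policy_4(0.6, 0.0, theta_max=0.5, theta_dot_max=1.0))      # -10.0
```

Policies 1 and 2 operate on the discretized sub-state indices, whereas policies 3 and 4 operate on the continuous state variables.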

Transitional probability (P(ś|s,a)) and discount factor (γ): Since the system dynamics are known, the need for finding the transitional probability is averted. The discount factor (γ), which is a parameter of temporal difference learning, is chosen to be 0.1.

In accordance with an embodiment, the parameters of the MDP model may include:

n_(θ) representing resolution of states in θ domain (from θ_(max) to −θ_(max));

n_({dot over (θ)}) representing resolution of states in {dot over (θ)} domain (from {dot over (θ)}_(max) to −{dot over (θ)}_(max));

n_(A) representing resolution of actions in τ domain (from τ_(max) to −τ_(max));

τ_(max) representing the maximum torque exertable on the pivot joint or the boundary of the τ domain;

θ_(max) representing the boundary of the θ domain;

{dot over (θ)}_(max) representing a finite boundary of the {dot over (θ)} domain;

p_(a) representing property of a curve from which torque values in the τ domain are sampled;

p_(θ) representing property of a curve from which boundaries of states of the θ domain are sampled; and

p_({dot over (θ)}) representing property of a curve from which boundaries of the states of the {dot over (θ)} domain are sampled.

Increasing n_(θ), n_({dot over (θ)}), n_(A) parameter values gives more resolution to the system states (S) and actions (A), at the cost of increased training time for the RL agent.
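The trade-off noted above can be made concrete with a small Python sketch. It assumes, for illustration, that the Q-value look-up table of the 1 DOF model holds one entry per combination of θ sub-state, {dot over (θ)} sub-state and action; the function name and the example resolutions are hypothetical.

```python
def q_table_entries(n_theta: int, n_theta_dot: int, n_a: int) -> int:
    """Number of (state, action) entries the RL agent has to learn."""
    return n_theta * n_theta_dot * n_a


# Hypothetical resolutions: the table grows cubically when all three are raised together.
for n in (5, 11, 21):
    print(n, q_table_entries(n, n, n))
# 5 125
# 11 1331
# 21 9261
```

Since every entry has to be visited repeatedly during training, a roughly fourfold increase in resolution along each dimension (from 5 to 21) multiplies the number of entries to be learnt by about 74.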

In an embodiment, the Q-learning technique, which is an off-policy algorithm for temporal difference learning, is configured to learn to generate a torque representing an action for each state of the inverted pendulum by receiving proprioceptive inputs in the form of angle and angular velocity of the pivot joint of the inverted pendulum; and generating the torque at the pivot joint of the inverted pendulum based on the received proprioceptive inputs to maintain an upright posture of the inverted pendulum. In an embodiment of the present disclosure, the optimal state-action value (Q-value) function is in the form of a look-up table.

In accordance with the present disclosure, the algorithm for Q-learning technique may be represented as given below:

Initialize Q(s, a) arbitrarily
Repeat (for each episode):
  Initialize s
  Repeat (for each step of the episode):
    Choose a from s using a policy derived from Q (e.g., ε-greedy)
    Take action a, observe r, ś
    $Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left(r + \gamma \max_{\acute{a}} Q(\acute{s},\acute{a})\right)$
    s ← ś
  Until s is terminal,

where α represents a learning rate whose value typically lies between 0 and 1. α=0 means Q(s, a) is not updated, i.e., there is no learning, whereas α=1 means Q(s, a) is set entirely from the most recent estimate of the expected total reward and is independent of the previous Q(s, a) value for that state and action. Also, ε represents an exploration factor associated with the ε-greedy policy, which sets the balance between exploration and exploitation during reinforcement learning. A value of ε near 0 ensures the RL agent prefers exploitation over exploration, and vice versa.

Here s and a are the current state and the action taken at time t, and ś and á are the state and the action at time t+1, respectively. It has been shown that, when the Q values are learnt over a sufficiently large number of episodes under the ε-greedy policy, the Q-learning algorithm converges to a close approximation of the optimal state-action value function (Q*(s, a)) with probability 1. Once the RL agent has learnt the Q values over a sufficient number of episodes, the RL agent is ready for exploitation only (i.e., ε=0); that is, at a given state it chooses the action with the maximum Q value by

$\arg\underset{a}{\max}{{Q\left( {s,\ a} \right)}.}$
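The update rule and the ε-greedy selection described above can be sketched for a look-up table as follows. This is a minimal illustration rather than the implementation of the disclosure; the function names and the list-of-lists table layout are assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick an action index from one row of the Q table.

    With probability epsilon a random action is explored; otherwise the
    action with the largest Q value is exploited (arg max over a).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(Q, s, a, r, s_next, alpha, gamma):
    """Apply one tabular Q-learning update in place.

    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a'))
    """
    target = r + gamma * max(Q[s_next])
    Q[s][a] = (1.0 - alpha) * Q[s][a] + alpha * target


# Hypothetical 2-state, 2-action table, with gamma = 0.1 as chosen above.
Q = [[0.0, 0.0], [1.0, 2.0]]
q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5, gamma=0.1)
```

Setting `epsilon` to 0 after training reduces `epsilon_greedy` to the pure arg max selection used for exploitation.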

In accordance with an embodiment of the present disclosure, the Center Of Pressure (COP) characteristics in the human body either along the sagittal or along the frontal plane are reproduced separately in accordance with a dynamical equation based on overall mass of the human body being concentrated at the point of Center Of Mass (COM) located at the 2^(nd) lumbar vertebrae of the human body, an approximate height of the 2^(nd) lumbar vertebrae, moment of inertia of the inverted pendulum, angle of the inverted pendulum with respect to the direction of gravity, gravitational acceleration, a stiffness constant and a damping constant, wherein the stiffness constant and the damping constant denote pivot joint properties of the inverted pendulum. FIG. 4 illustrates a 1 Degree Of Freedom (1 DOF) inverted pendulum representing a free body diagram of a simplified human body in the sagittal plane, in accordance with an embodiment of the present disclosure. In an embodiment, the dynamical equation may be represented as:

$$I\frac{d^{2}\theta}{dt^{2}} = \tau + mgL\sin\theta - B\theta - K\dot{\theta}, \tag{4}$$

where I = mL², and where m represents the overall mass of the human body being concentrated at the point of Center Of Mass (COM) located at the 2^(nd) lumbar vertebrae of the human body; L represents the approximate height of the 2^(nd) lumbar vertebrae; θ represents angle of the inverted pendulum; g represents gravitational acceleration; I represents moment of inertia of the inverted pendulum; K represents the stiffness constant; and B represents the damping constant.
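A single integration step of equation (4) can be sketched as below. The equation is transcribed as written in this description; the semi-implicit Euler scheme, the time step and the function name `pendulum_step` are assumptions chosen for illustration only.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def pendulum_step(theta, theta_dot, tau, m, L, B, K, dt):
    """Advance the 1 DOF inverted pendulum of equation (4) by one time step.

    Returns the new angle, the new angular velocity and the angular
    acceleration used for the step. ASSUMPTION: semi-implicit Euler.
    """
    inertia = m * L ** 2                       # I = m L^2
    theta_ddot = (tau + m * G * L * math.sin(theta)
                  - B * theta - K * theta_dot) / inertia
    theta_dot_new = theta_dot + dt * theta_ddot
    theta_new = theta + dt * theta_dot_new
    return theta_new, theta_dot_new, theta_ddot
```

With zero torque and zero joint constants, the acceleration reduces to (g/L) sin θ, which is the familiar unstable behaviour of an unactuated inverted pendulum.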

In accordance with an embodiment of the present disclosure, the one or more processors 104 are configured to derive, at step 304, dynamical equations of a Spherical Inverted Pendulum (SIP) with respect to the global coordinate system. In accordance with the present disclosure, the SIP represents a second mechanical model of the human body that is capable of exhibiting postural sway along both the frontal plane and the sagittal plane together. In an embodiment, the dynamical equations are derived by using Lagrange's equations with two independent state variables (θx and θy) that represent angular deviation of the SIP about a pivot joint and along x and y axes respectively of the global coordinate system, wherein the pivot joint characterizes an ankle joint of the human body. FIG. 5 illustrates an SIP having 2 DOF with the two state variables, θx and θy, in accordance with an embodiment of the present disclosure. As the reference plane passes through the origin, there are two sets of equations for the SIP, one for the region above the reference plane and one for the region below it. In FIG. 5, (l_(x), l_(y), l_(z)) is a Cartesian co-ordinate system representation of the position of the pendulum mass. The symbols m and l represent the pendulum mass and the length of the shaft of the SIP, respectively.

In accordance with the present disclosure, the dynamical equations are derived as described hereinafter. The Lagrange's equations are as given below.

$$\frac{\partial}{\partial t}\left(\frac{\partial L}{\partial \dot{\theta}_{x}}\right) - \frac{\partial L}{\partial \theta_{x}} = \tau_{x}, \qquad \frac{\partial}{\partial t}\left(\frac{\partial L}{\partial \dot{\theta}_{y}}\right) - \frac{\partial L}{\partial \theta_{y}} = \tau_{y} \tag{5}$$

where L is the Lagrangian, given by:

$$L = K - V \tag{6}$$

$$K = \text{Kinetic energy} = \frac{1}{2}m\left(v_{x}^{2} + v_{y}^{2} + v_{z}^{2}\right) \tag{7}$$

$$V = \text{Potential energy} = mgl_{z} \tag{8}$$

where,

$$v_{x} = \frac{\partial l_{x}}{\partial t};\quad v_{y} = \frac{\partial l_{y}}{\partial t};\quad v_{z} = \frac{\partial l_{z}}{\partial t} \tag{9}$$

$$l_{x} = l\sin\theta\cos\phi;\quad l_{y} = l\sin\theta\sin\phi;\quad l_{z} = l\cos\theta \tag{10}$$

Representing θ and ϕ in terms of θ_(x) and θ_(y):

$$\tan\theta = \frac{\sqrt{l_{x}^{2} + l_{y}^{2}}}{l_{z}} \quad\text{and}\quad \tan\phi = \frac{l_{y}}{l_{x}},$$

where

$$\tan\theta_{y} = \frac{l_{x}}{l_{z}} \quad\text{and}\quad \tan\theta_{x} = \frac{l_{y}}{l_{z}}.$$

Therefore,

$$\tan\theta = \sqrt{\left(\frac{l_{x}}{l_{z}}\right)^{2} + \left(\frac{l_{y}}{l_{z}}\right)^{2}} = \sqrt{\tan^{2}\theta_{y} + \tan^{2}\theta_{x}} \tag{11}$$

and

$$\tan\phi = \frac{l_{y}}{l_{x}} = \frac{l_{y}/l_{z}}{l_{x}/l_{z}} = \frac{\tan\theta_{x}}{\tan\theta_{y}} \tag{12}$$

From equations (11) and (12),

$$\cos\theta = \frac{1}{\sqrt{1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}}};\qquad \sin\theta = \sqrt{\frac{\tan^{2}\theta_{x} + \tan^{2}\theta_{y}}{1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}}}$$

$$\cos\phi = \frac{\tan\theta_{y}}{\sqrt{\tan^{2}\theta_{x} + \tan^{2}\theta_{y}}};\qquad \sin\phi = \frac{\tan\theta_{x}}{\sqrt{\tan^{2}\theta_{x} + \tan^{2}\theta_{y}}}$$

Further, in accordance with the present disclosure, kinetic energy, potential energy and Lagrangian may be derived as below. Using the values of cos θ, sin θ, cos ϕ and sin ϕ in equation (10),

$$l_{x} = \frac{l\tan\theta_{y}}{\sqrt{1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}}};\quad l_{y} = \frac{l\tan\theta_{x}}{\sqrt{1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}}};\quad l_{z} = \pm\frac{l}{\sqrt{1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}}} \tag{13}$$

Plugging equation (13) in equation (9) gives the velocity components along the x, y, and z axes as follows:

$$\begin{aligned}
v_{x} &= \frac{\partial l_{x}}{\partial t} = l\,\frac{\sec^{2}\theta_{x}\sec^{2}\theta_{y}\,\dot{\theta}_{y} - \tan\theta_{x}\tan\theta_{y}\sec^{2}\theta_{x}\,\dot{\theta}_{x}}{\left(1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}\right)^{3/2}} \\
v_{y} &= \frac{\partial l_{y}}{\partial t} = l\,\frac{\sec^{2}\theta_{x}\sec^{2}\theta_{y}\,\dot{\theta}_{x} - \tan\theta_{x}\tan\theta_{y}\sec^{2}\theta_{y}\,\dot{\theta}_{y}}{\left(1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}\right)^{3/2}} \\
v_{z} &= \frac{\partial l_{z}}{\partial t} = -l\,\frac{\tan\theta_{x}\sec^{2}\theta_{x}\,\dot{\theta}_{x} + \tan\theta_{y}\sec^{2}\theta_{y}\,\dot{\theta}_{y}}{\left(1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}\right)^{3/2}}
\end{aligned} \tag{14}$$

The kinetic energy and the potential energy derived from equations (7), (8), (13) and (14) may be represented as below.

$$K = \frac{1}{2}ml^{2}\,\frac{\sec^{2}\theta_{x}\sec^{2}\theta_{y}\left[\dot{\theta}_{x}^{2} + \dot{\theta}_{y}^{2} + \left(\tan\theta_{x}\,\dot{\theta}_{x} - \tan\theta_{y}\,\dot{\theta}_{y}\right)^{2}\right]}{\left(1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}\right)^{2}} \tag{15}$$

$$V = \pm\frac{mgl}{\sqrt{1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}}};\quad +\text{ve for } \theta \leq 90^{\circ} \text{ and } -\text{ve for } \theta \geq 90^{\circ}$$
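As a sanity check on the derivation, the closed-form kinetic energy of equation (15) can be compared numerically with ½m(v_x² + v_y² + v_z²) computed from the velocity components of equation (14). The sketch below does this for one arbitrary state; the function names and the numerical values are hypothetical and serve only to illustrate the check.

```python
import math


def sip_velocity(l, th_x, th_y, dth_x, dth_y):
    """Velocity components of the pendulum mass, following equation (14)."""
    tx, ty = math.tan(th_x), math.tan(th_y)
    sx, sy = 1.0 + tx * tx, 1.0 + ty * ty          # sec^2 of each angle
    d = (1.0 + tx * tx + ty * ty) ** 1.5
    vx = l * (sx * sy * dth_y - tx * ty * sx * dth_x) / d
    vy = l * (sx * sy * dth_x - tx * ty * sy * dth_y) / d
    vz = -l * (tx * sx * dth_x + ty * sy * dth_y) / d
    return vx, vy, vz


def sip_kinetic_energy(m, l, th_x, th_y, dth_x, dth_y):
    """Closed-form kinetic energy, following equation (15)."""
    tx, ty = math.tan(th_x), math.tan(th_y)
    sx, sy = 1.0 + tx * tx, 1.0 + ty * ty
    bracket = dth_x ** 2 + dth_y ** 2 + (tx * dth_x - ty * dth_y) ** 2
    return 0.5 * m * l * l * sx * sy * bracket / (1.0 + tx * tx + ty * ty) ** 2


# Arbitrary illustrative state (not taken from the disclosure).
m, l = 60.0, 1.0
state = (0.05, -0.03, 0.2, -0.1)
vx, vy, vz = sip_velocity(l, *state)
k_from_velocity = 0.5 * m * (vx * vx + vy * vy + vz * vz)
k_closed_form = sip_kinetic_energy(m, l, *state)
```

The two values agree to floating-point precision, which is consistent with equation (15) being the algebraic simplification of equations (7) and (14).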

In accordance with an embodiment of the present disclosure, substituting the kinetic energy and the potential energy into equation (6) provides the Lagrangian (L), and substituting the Lagrangian into equation (5) yields the dynamical equations of the SIP, which are represented as:

for θ=0 to 90°, θ being the polar or zenith angle of the spherical coordinate system,

$$\begin{aligned}
\ddot{\theta}_{x} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\tfrac{1}{2}\left(1 + 3\cos 2\theta_{x} - \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
&+ \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\tan^{2}\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} - 2\dot{\theta}_{x}^{2}\tan\theta_{x}\tan^{2}\theta_{y}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
&+ \frac{g}{l}\frac{\tan\theta_{x}}{\sec^{2}\theta_{x}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{y}\tan\theta_{x}\tan\theta_{y} + \tau_{x}\sec^{2}\theta_{y}\right),
\end{aligned} \tag{16}$$

$$\begin{aligned}
\ddot{\theta}_{y} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\tfrac{1}{2}\left(1 + 3\cos 2\theta_{y} - \cos 2\theta_{x} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
&+ \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\tan^{2}\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} - 2\dot{\theta}_{y}^{2}\tan\theta_{y}\tan^{2}\theta_{x}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
&+ \frac{g}{l}\frac{\tan\theta_{y}}{\sec^{2}\theta_{y}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{x}\tan\theta_{x}\tan\theta_{y} + \tau_{y}\sec^{2}\theta_{x}\right);
\end{aligned} \tag{17}$$

for θ > 90°,

$$\begin{aligned}
\ddot{\theta}_{x} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\tfrac{1}{2}\left(1 + 3\cos 2\theta_{x} - \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
&+ \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\tan^{2}\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} - 2\dot{\theta}_{x}^{2}\tan\theta_{x}\tan^{2}\theta_{y}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
&- \frac{g}{l}\frac{\tan\theta_{x}}{\sec^{2}\theta_{x}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{y}\tan\theta_{x}\tan\theta_{y} + \tau_{x}\sec^{2}\theta_{y}\right)
\end{aligned} \tag{18}$$

and

$$\begin{aligned}
\ddot{\theta}_{y} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\tfrac{1}{2}\left(1 + 3\cos 2\theta_{y} - \cos 2\theta_{x} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
&+ \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\tan^{2}\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} - 2\dot{\theta}_{y}^{2}\tan\theta_{y}\tan^{2}\theta_{x}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
&- \frac{g}{l}\frac{\tan\theta_{y}}{\sec^{2}\theta_{y}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{x}\tan\theta_{x}\tan\theta_{y} + \tau_{y}\sec^{2}\theta_{x}\right)
\end{aligned} \tag{19}$$

where θ̈_(x) and θ̈_(y) represent the angular acceleration with reference to the x and y axes respectively.
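The dynamical equations of the SIP can be transcribed into a single function as sketched below. This is an illustrative transcription under stated assumptions: the torque term of the θ̈_(x) equation for θ ≤ 90° is taken to be τ_y tan θ_x tan θ_y + τ_x sec² θ_y, matching the θ > 90° form, and the function name, argument order and the `above` flag are hypothetical.

```python
import math


def sip_acceleration(th_x, th_y, dth_x, dth_y, tau_x, tau_y, m, l,
                     g=9.81, above=True):
    """Angular accelerations of the SIP, transcribing equations (16)-(19).

    `above=True` selects the theta <= 90 deg equations (gravity term +),
    `above=False` the theta > 90 deg equations (gravity term -).
    """
    tx, ty = math.tan(th_x), math.tan(th_y)
    sx, sy = 1.0 + tx * tx, 1.0 + ty * ty      # sec^2(th_x), sec^2(th_y)
    s = sy + tx * tx                            # sec^2(th_y) + tan^2(th_x)
    c2x, c2y = math.cos(2.0 * th_x), math.cos(2.0 * th_y)
    common = 1.0 + c2x + c2y + c2x * c2y
    w = dth_x * dth_y

    bracket_x = (0.5 * (1.0 + 3.0 * c2x - c2y + c2x * c2y) * sx * sy ** 2 * ty * w
                 + common * sx * sy ** 2 * ty * tx ** 2 * w
                 - 2.0 * dth_x ** 2 * tx * ty ** 2 * s)
    bracket_y = (0.5 * (1.0 + 3.0 * c2y - c2x + c2x * c2y) * sy * sx ** 2 * tx * w
                 + common * sy * sx ** 2 * tx * ty ** 2 * w
                 - 2.0 * dth_y ** 2 * ty * tx ** 2 * s)

    sign = 1.0 if above else -1.0
    torque_gain = s / (m * l ** 2 * sx * sy)
    acc_x = (bracket_x / s ** 2
             + sign * (g / l) * (tx / sx) * math.sqrt(s)
             + torque_gain * (tau_y * tx * ty + tau_x * sy))
    acc_y = (bracket_y / s ** 2
             + sign * (g / l) * (ty / sy) * math.sqrt(s)
             + torque_gain * (tau_x * tx * ty + tau_y * sx))
    return acc_x, acc_y
```

Two limiting cases give a quick plausibility check: at rest with θ_y = 0 and no torque, the function returns (g/l) sin θ_x for the x axis, and at the upright position a torque τ_x produces an acceleration of τ_x/(ml²).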

The dynamical equations of the SIP were simulated for the following initial condition:

(θ_(x)=2.3562 rad, θ̇_(x)=2.01 rad/sec, θ_(y)=2.3562 rad, θ̇_(y)=2.01 rad/sec), wherein, given θ̇_(y), θ̇_(x) is calculated by setting v_(z)=0 in the equation

$$v_{z} = -l\,\frac{\tan\theta_{x}\sec^{2}\theta_{x}\,\dot{\theta}_{x} + \tan\theta_{y}\sec^{2}\theta_{y}\,\dot{\theta}_{y}}{\left(1 + \tan^{2}\theta_{x} + \tan^{2}\theta_{y}\right)^{3/2}},$$

and wherein v_(z) represents the velocity along the z axis, θ_(x) and θ_(y) are the angles of the SIP with respect to the x axis and the y axis respectively, and θ̇_(x) and θ̇_(y) are the corresponding angular velocities of the SIP. The dynamical equations of the SIP were also simulated with a changed initial angular velocity of 1.2 rad/sec, the initial condition being (θ_(x)=2.3562 rad, θ̇_(x)=1.2 rad/sec, θ_(y)=−2.3562 rad, θ̇_(y)=1.2 rad/sec), where θ̇_(x) is again calculated from the given θ̇_(y) by setting v_(z)=0 in the same equation. The SIP equations were also checked for the above-the-ground scenario by simulating the dynamical equations for the following initial condition: (θ_(x)=0.0524 rad, θ̇_(x)=0 rad/sec, θ_(y)=−0.0354 rad, θ̇_(y)=0 rad/sec).
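Setting v_(z)=0 in the expression above and solving for the x-axis angular velocity gives θ̇_(x) = −(tan θ_(y) sec² θ_(y) θ̇_(y)) / (tan θ_(x) sec² θ_(x)). A minimal sketch of this calculation follows; the function name is hypothetical, and the values used are those of the second initial condition stated above.

```python
import math


def dtheta_x_for_zero_vz(th_x, th_y, dth_y):
    """Return the x-axis angular velocity that makes v_z vanish.

    Solves tan(th_x) sec^2(th_x) dth_x + tan(th_y) sec^2(th_y) dth_y = 0
    for dth_x. Requires tan(th_x) != 0.
    """
    tx, ty = math.tan(th_x), math.tan(th_y)
    sec2_x, sec2_y = 1.0 + tx * tx, 1.0 + ty * ty
    return -(ty * sec2_y * dth_y) / (tx * sec2_x)


# theta_x = 2.3562 rad, theta_y = -2.3562 rad, given d(theta_y)/dt = 1.2 rad/sec
dth_x = dtheta_x_for_zero_vz(2.3562, -2.3562, 1.2)
```

Because θ_(y) = −θ_(x) in this case, the tangents cancel with opposite sign and the function returns 1.2 rad/sec, in agreement with the stated initial condition.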

In accordance with an embodiment of the present disclosure, the one or more processors 104 are configured to model, at step 306, the human postural sway both along the sagittal plane and along the frontal plane together using the modeled neuronal controller and the derived dynamical equations of the SIP by tuning (i) a reward function comprised in the modeled neuronal controller and (ii) a set of parameters to balance the SIP such that the postural sway characteristics generated by the modeled neuronal controller match the postural sway characteristics of one or more control subjects.

To validate the sway characteristics generated by the model of the present disclosure, experimental data pertaining to the human postural sway were collected in the form of the COP characteristics of four control subjects in the age group 25 to 40 years, with heights in the range 5′3″ to 5′10″ and body weights in the range 50 to 70 kg, standing bipedally for approximately 30 seconds. It was noted from the frequency spectrum of the COP-x signal, where |X(k)| is the Fourier coefficient corresponding to the k^(th) frequency component, that the spectrum is almost uniformly distributed from 0 to 1 Hz, except for a few peaks at very low frequencies comparable to the DC component. The stated experimental data is also referred to later in the description with reference to FIG. 6A through FIG. 6D, FIG. 7A through FIG. 7D, FIG. 8A through FIG. 8D and FIG. 9A through FIG. 9D. To match the human postural sway seen in the control subjects, in accordance with the present disclosure, three additional parameters are introduced in the model. In accordance with the present disclosure, the set of parameters includes the parameters of the MDP model and the three additional parameters associated with physiology of the human body.

In accordance with an embodiment, the parameters associated with physiology of the human body may include White Gaussian Noise (WGN) added to the generated torque to represent a real world noisy biological system; filtering factor (λ) added to portray signaling characteristics of a neuromuscular junction; and Scaling Factor (SF) introduced to scale down the magnitude of the generated torque for each state of the inverted pendulum after the training of the RL agent is completed, wherein completion of the training is represented by a balanced SIP.

In accordance with an embodiment of the present disclosure, a relationship between the generated torque by the modeled neuronal controller and the torque applied to the dynamical system based on the parameters associated with physiology of the human body is represented as:

$$\tau(t) = \lambda\,\tau_{1}(t) + (1 - \lambda)\,\tau(t - 1) \quad\text{and}\quad \tau_{SF}(t) = \frac{\tau(t)}{SF},$$

wherein τ₁=τ_(NC)+c×WGN, τ(t) represents the torque applied to the dynamical system, τ_(SF)(t) represents the torque after introducing the Scaling Factor, τ_(NC) represents the generated torque by the modeled neuronal controller, WGN represents the White Gaussian Noise of magnitude 1, and c represents noise amplitude.
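The relationship above amounts to adding noise to the controller output, passing the result through a first-order low-pass filter and then scaling it. A minimal sketch is given below; the function name is hypothetical, and interpreting "White Gaussian Noise of magnitude 1" as zero-mean, unit-variance Gaussian samples is an assumption.

```python
import random


def applied_torque(tau_nc, tau_prev, lam, c, sf, rng):
    """Return (tau, tau_sf) for one time step.

    tau_1  = tau_nc + c * WGN          (noisy controller output)
    tau    = lam * tau_1 + (1 - lam) * tau_prev
    tau_sf = tau / sf
    ASSUMPTION: WGN is drawn as a zero-mean, unit-variance Gaussian sample.
    """
    tau_1 = tau_nc + c * rng.gauss(0.0, 1.0)
    tau = lam * tau_1 + (1.0 - lam) * tau_prev
    return tau, tau / sf


# Noise-free example with a filtering factor of 0.8 and a scaling factor of 3.
tau, tau_sf = applied_torque(tau_nc=100.0, tau_prev=50.0, lam=0.8,
                             c=0.0, sf=3.0, rng=random.Random(0))
```

A filtering factor of 1 passes the noisy controller torque through unchanged, while smaller values give more weight to the previously applied torque and therefore smooth the output.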

Experimental Results

The sway characteristics of the model were first generated without incorporating the filtering factor, the scaling factor and the noise. The closest human postural sway characteristics generated by the model of the present disclosure are illustrated in FIG. 6A through FIG. 6D and FIG. 7A through FIG. 7D. Particularly, FIG. 6A and FIG. 6B illustrate the COP characteristics along the x axis and a corresponding frequency spectrum while FIG. 6C and FIG. 6D illustrate the COP characteristics along the y axis and a corresponding frequency spectrum for the following set of parameters:

$$\begin{aligned}
& n_{\theta_{x}} = 15,\; n_{\theta_{y}} = 15;\; n_{\dot{\theta}_{x}} = 7,\; n_{\dot{\theta}_{y}} = 7,\; n_{\tau_{x}} = 9,\; n_{\tau_{y}} = 9, \\
& \tau_{x_{\max}} = 500\ \text{Nm},\; \tau_{y_{\max}} = 500\ \text{Nm},\; \theta_{x_{\max}} = \theta_{y_{\max}} = \frac{\pi}{8}\ \text{rad},\; \dot{\theta}_{x_{\max}} = \dot{\theta}_{y_{\max}} = 0.3\ \text{rad/s}, \\
& p_{\theta_{x}} = 3,\; p_{\theta_{y}} = 3,\; p_{\dot{\theta}_{x}} = 2,\; p_{\dot{\theta}_{y}} = 2,\; p_{\tau_{x}} = 3,\; p_{\tau_{y}} = 3, \\
& \sigma_{\theta_{x}} = 0.01\ \text{rad},\; \sigma_{\theta_{y}} = 0.01\ \text{rad},\; \sigma_{\dot{\theta}_{x}} = 0.045,\; \sigma_{\dot{\theta}_{y}} = 0.045,\; SF_{\tau_{x}} = 3,\; SF_{\tau_{y}} = 3.
\end{aligned}$$

It may be noted that the frequency spectrum of COP-x closely matches the experimental data (detailed above), but for COP-y the frequency spectrum is twice as spread as that of the experimental data.

FIG. 7A and FIG. 7B illustrate variation of the angle of the SIP and the angular velocity of the SIP with respect to the x axis and the y axis respectively (θ_(x), θ_(y), θ̇_(x), θ̇_(y)) with respect to time. FIG. 7C illustrates a plot of applied torques τ_(x), τ_(y) at the pivot joint by the neuronal controller of the present disclosure. It may be noted that the maximum torque is approximately 166.67 Nm. FIG. 7D illustrates COP movement along the x axis and the y axis with parameter values stated with reference to FIG. 6A through FIG. 6D.

The additional parameters, viz., the filtering factor (λ_(τ_x), λ_(τ_y)), the noise amplitude (c_(τ_x), c_(τ_y)) and the scaling factor (SF_(τ_x), SF_(τ_y)), were then introduced, which resulted in the human postural sway characteristics as seen in the experimental data. After tuning the parameters of the MDP model and the additional parameters, and choosing the Reward policy-3 with appropriate standard deviation, the model of the present disclosure produced an expected sway as depicted in FIG. 8A through FIG. 8D and FIG. 9A through FIG. 9D. Particularly, FIG. 8A and FIG. 8B illustrate the COP characteristics along the x axis and a corresponding frequency spectrum while FIG. 8C and FIG. 8D illustrate the COP characteristics along the y axis and a corresponding frequency spectrum. It may be noted that the frequency spectrum is spread from 0 to 1 Hz as observed in the experimental data. FIG. 9A and FIG. 9B illustrate variation of the angle of the SIP and the angular velocity of the SIP with respect to the x axis and the y axis respectively (θ_(x), θ_(y), θ̇_(x), θ̇_(y)) with respect to time. FIG. 9C illustrates a plot of applied torques τ_(x), τ_(y) at the pivot joint by the neuronal controller of the present disclosure. FIG. 9D illustrates COP movement along the x axis and the y axis with parameter values used with reference to FIG. 8A through FIG. 8D. The illustrated sway characteristics were observed using the following set of parameters:

$$\begin{aligned}
& n_{\theta_{x}} = 15,\; n_{\theta_{y}} = 15;\; n_{\dot{\theta}_{x}} = 7,\; n_{\dot{\theta}_{y}} = 7,\; n_{\tau_{x}} = 9,\; n_{\tau_{y}} = 9, \\
& \tau_{x_{\max}} = 500\ \text{Nm},\; \tau_{y_{\max}} = 500\ \text{Nm},\; \theta_{x_{\max}} = \theta_{y_{\max}} = \frac{\pi}{8}\ \text{rad},\; \dot{\theta}_{x_{\max}} = \dot{\theta}_{y_{\max}} = 0.3\ \text{rad/s}, \\
& p_{\theta_{x}} = 3,\; p_{\theta_{y}} = 3,\; p_{\dot{\theta}_{x}} = 2,\; p_{\dot{\theta}_{y}} = 2,\; p_{\tau_{x}} = 3,\; p_{\tau_{y}} = 3, \\
& \sigma_{\theta_{x}} = 0.01\ \text{rad},\; \sigma_{\theta_{y}} = 0.01\ \text{rad},\; \sigma_{\dot{\theta}_{x}} = 0.045,\; \sigma_{\dot{\theta}_{y}} = 0.045, \\
& \lambda_{\tau_{x}} = 0.8,\; \lambda_{\tau_{y}} = 0.8,\; SF_{\tau_{x}} = 3,\; SF_{\tau_{y}} = 3,\; c_{\tau_{x}} = 7.75\ \text{Nm},\; c_{\tau_{y}} = 7.75\ \text{Nm}.
\end{aligned}$$

In accordance with the present disclosure, by obtaining the COP characteristics of a test subject using, say, a pressure sensing board, a frequency spectrum representative of the postural sway of the test subject may be derived. Accordingly, using the neuronal controller model described above, a best set of parameters that corresponds to the postural sway of the test subject can be derived. Deviation of the derived set of parameters from those of the one or more control subjects provides insights into balance impairments, if any, associated with the test subject. Thus the mathematical concepts disclosed in the description of the method of the present disclosure are integrated into a practical application of modeling a neuronal controller that exhibits human postural sway, so that an analysis of the postural balance of the test subject can be performed and subject-specific rehabilitation protocols may be prescribed.
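Deriving a frequency spectrum from a recorded COP signal can be sketched with a direct discrete Fourier transform, as below. The sampling rate, the synthetic test signal and the function name are assumptions for illustration; the disclosure does not state how the COP signal is sampled or which transform implementation is used.

```python
import cmath
import math


def magnitude_spectrum(signal, fs):
    """Return (frequencies in Hz, |X(k)| / N) for k = 0 .. N/2.

    A direct O(N^2) DFT is used to keep the sketch dependency-free;
    an FFT routine would normally be preferred for long recordings.
    """
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        mags.append(abs(acc) / n)
    return freqs, mags


# Synthetic stand-in for a 30 s COP-x recording: a 0.5 Hz sway sampled at
# a hypothetical rate of 10 Hz (not measured data).
fs = 10.0
cop_x = [math.sin(2.0 * math.pi * 0.5 * i / fs) for i in range(300)]
freqs, mags = magnitude_spectrum(cop_x, fs)
peak = max(range(len(mags)), key=mags.__getitem__)
```

For a 30 second recording the frequency resolution is 1/30 Hz, so the 0 to 1 Hz band discussed above spans roughly the first 30 spectral bins.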

The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.

It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.

The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.

Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.

It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims. 

What is claimed is:
 1. A processor implemented method for modeling a neuronal controller exhibiting human postural sway, the method comprising the steps of: modeling, by one or more hardware processors, the neuronal controller in the form of a Reinforcement Learning (RL) agent based on an inverted pendulum with 1 Degree Of Freedom (1 DOF) representing a first mechanical model of a human body in the form of a dynamical system, wherein the RL agent is trained using a Q-learning technique to learn an optimal state-action value (Q-value) function for a tuneable Markov Decision Process (MDP) model, wherein the modeled neuronal controller is configured to reproduce Center Of Pressure (COP) characteristics in the human body either along a sagittal plane or along a frontal plane separately; deriving, by the one or more hardware processors, dynamical equations of a Spherical Inverted Pendulum (SIP) with respect to a global coordinate system, wherein the SIP represents a second mechanical model of the human body that exhibits postural sway along both the frontal plane and the sagittal plane together, wherein the dynamical equations are derived by using Lagrange's equations with two independent state variables (θx and θy) being angular deviation of the SIP about a pivot joint and along x and y axes respectively of the global coordinate system, wherein the pivot joint characterizes an ankle joint of the human body; and modeling, by the one or more hardware processors, the human postural sway both along the sagittal plane and along the frontal plane together using the modeled neuronal controller and the derived dynamical equations of the SIP by tuning (i) a reward function comprised in the modeled neuronal controller and (ii) a set of parameters to balance the SIP such that the postural sway characteristics generated by the modeled neuronal controller match the postural sway characteristics of one or more control subjects, wherein the set of parameters include parameters of the MDP model and parameters associated with physiology of the human body.
 2. The processor implemented method of claim 1, wherein the Q-learning technique is configured to learn to generate a torque representing an action for each state of the inverted pendulum by: receiving proprioceptive inputs in the form of angle and angular velocity of the pivot joint of the inverted pendulum; and generating the torque at the pivot joint of the inverted pendulum based on the received proprioceptive inputs to maintain an upright posture of the inverted pendulum.
 3. The processor implemented method of claim 1, wherein the step of modeling a neuronal controller comprises reproducing the Center Of Pressure (COP) characteristics in the human body either along the sagittal or along the frontal plane separately in accordance with a dynamical equation based on overall mass of the human body being concentrated at the point of Center Of Mass (COM) located at the 2^(nd) lumbar vertebrae of the human body, an approximate height of the 2^(nd) lumbar vertebrae, moment of inertia of the inverted pendulum, angle of the inverted pendulum with respect to the direction of gravity, gravitational acceleration, a stiffness constant and a damping constant, wherein the stiffness constant and the damping constant denote pivot joint properties of the inverted pendulum.
 4. The processor implemented method of claim 3, wherein the step of modeling a neuronal controller comprises reproducing the Center Of Pressure (COP) characteristics in the human body either along the sagittal or along the frontal plane separately in accordance with the dynamical equation represented as: $I\frac{d^{2}\theta}{dt^{2}} = \tau + mgL\sin\theta - K\theta - B\dot{\theta},$ wherein I=mL², and wherein m represents overall mass of the human body being concentrated at the point of Center Of Mass (COM) located at the 2nd lumbar vertebra of the human body; L represents the approximate height of the 2nd lumbar vertebra; θ represents angle of the inverted pendulum with respect to the direction of gravity; g represents gravitational acceleration; I represents moment of inertia of the inverted pendulum; K represents the stiffness constant; and B represents the damping constant.
 5. The processor implemented method of claim 2, wherein the parameters of the MDP model include: $n_{\theta}$ representing resolution of states in the $\theta$ domain (from $\theta_{max}$ to $-\theta_{max}$); $n_{\dot{\theta}}$ representing resolution of states in the $\dot{\theta}$ domain (from $\dot{\theta}_{max}$ to $-\dot{\theta}_{max}$); $n_{A}$ representing resolution of states in the $\tau$ domain (from $\tau_{max}$ to $-\tau_{max}$); $\tau_{max}$ representing the maximum torque exertable on the pivot joint or the boundary of the $\tau$ domain; $\theta_{max}$ representing the boundary of the $\theta$ domain; $\dot{\theta}_{max}$ representing a finite boundary of the $\dot{\theta}$ domain; $p_{\alpha}$ representing property of a curve from which torque values in the $\tau$ domain are sampled; $p_{\theta}$ representing property of a curve from which boundaries of states of the $\theta$ domain are sampled; and $p_{\dot{\theta}}$ representing property of a curve from which boundaries of the states of the $\dot{\theta}$ domain are sampled.
 6. The processor implemented method of claim 2, wherein the parameters associated with physiology of the human body include: White Gaussian Noise (WGN) added to the generated torque to represent a real world noisy biological system; filtering factor (λ) added to portray signaling characteristics of a neuromuscular junction; and Scaling Factor (SF) introduced to scale down the magnitude of the generated torque for each state of the inverted pendulum after the training of the RL agent is completed, wherein completion of the training is represented by a balanced SIP.
 7. The processor implemented method of claim 6, wherein a relationship between the generated torque by the modeled neuronal controller and the torque applied to the dynamical system based on the parameters associated with physiology of the human body is represented as: $\tau(t) = \lambda\tau_{1}(t) + (1 - \lambda)\tau(t - 1)$ and $\tau_{SF}(t) = \frac{\tau(t)}{SF},$ wherein $\tau_{1} = \tau_{NC} + c \times WGN$, $\tau(t)$ represents the torque applied to the dynamical system, $\tau_{SF}(t)$ represents the torque after introducing the Scaling Factor, $\tau_{NC}$ represents the generated torque by the modeled neuronal controller, WGN represents the White Gaussian Noise of magnitude 1, and c represents noise amplitude.
 8. The processor implemented method of claim 7, wherein the dynamical equations of the SIP are represented as: for θ=0 to 90°, θ being the polar or zenith angle of the spherical coordinate system,
$\begin{aligned}
\ddot{\theta}_{x} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\frac{1}{2}\left(1 + 3\cos 2\theta_{x} - \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& + \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\tan^{2}\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& - 2\dot{\theta}_{x}^{2}\tan\theta_{x}\tan^{2}\theta_{y}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
& + \frac{g}{l}\frac{\tan\theta_{x}}{\sec^{2}\theta_{x}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{y}\tan\theta_{x}\tan\theta_{y} + \tau_{x}\sec^{2}\theta_{y}\right),
\end{aligned}$
and
$\begin{aligned}
\ddot{\theta}_{y} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\frac{1}{2}\left(1 + 3\cos 2\theta_{y} - \cos 2\theta_{x} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& + \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\tan^{2}\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& - 2\dot{\theta}_{y}^{2}\tan\theta_{y}\tan^{2}\theta_{x}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
& + \frac{g}{l}\frac{\tan\theta_{y}}{\sec^{2}\theta_{y}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{x}\tan\theta_{x}\tan\theta_{y} + \tau_{y}\sec^{2}\theta_{x}\right);
\end{aligned}$
and for θ > 90°,
$\begin{aligned}
\ddot{\theta}_{x} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\frac{1}{2}\left(1 + 3\cos 2\theta_{x} - \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& + \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\tan^{2}\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& - 2\dot{\theta}_{x}^{2}\tan\theta_{x}\tan^{2}\theta_{y}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
& - \frac{g}{l}\frac{\tan\theta_{x}}{\sec^{2}\theta_{x}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{y}\tan\theta_{x}\tan\theta_{y} + \tau_{x}\sec^{2}\theta_{y}\right)
\end{aligned}$
and
$\begin{aligned}
\ddot{\theta}_{y} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\frac{1}{2}\left(1 + 3\cos 2\theta_{y} - \cos 2\theta_{x} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& + \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\tan^{2}\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& - 2\dot{\theta}_{y}^{2}\tan\theta_{y}\tan^{2}\theta_{x}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
& - \frac{g}{l}\frac{\tan\theta_{y}}{\sec^{2}\theta_{y}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{x}\tan\theta_{x}\tan\theta_{y} + \tau_{y}\sec^{2}\theta_{x}\right).
\end{aligned}$
 9. A system for modeling a neuronal controller exhibiting human postural sway, the system comprising: one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to: model the neuronal controller in the form of a Reinforcement Learning (RL) agent based on an inverted pendulum with 1 Degree Of Freedom (1 DOF) representing a first mechanical model of a human body in the form of a dynamical system, wherein the RL agent is trained using a Q-learning technique to learn an optimal state-action value (Q-value) function for a tuneable Markov Decision Process (MDP) model, wherein the modeled neuronal controller is configured to reproduce Center Of Pressure (COP) characteristics in the human body either along a sagittal plane or along a frontal plane separately; derive dynamical equations of a Spherical Inverted Pendulum (SIP) with respect to a global coordinate system, wherein the SIP represents a second mechanical model of the human body that exhibits postural sway along both the frontal plane and the sagittal plane together, wherein the dynamical equations are derived by using Lagrange's equations with two independent state variables (θx and θy) being angular deviation of the SIP about a pivot joint and along x and y axes respectively of the global coordinate system, wherein the pivot joint characterizes an ankle joint of the human body; and model the human postural sway both along the sagittal plane and along the frontal plane together using the modeled neuronal controller and the derived dynamical equations of the SIP by tuning (i) a reward function comprised in the modeled neuronal controller and (ii) a set of parameters to balance the SIP such that the postural sway characteristics generated by the modeled neuronal controller match the postural sway characteristics of one or more control subjects, wherein the set of parameters include 
parameters of the MDP model and parameters associated with physiology of the human body.
 10. The system of claim 9, wherein the Q-learning technique is configured to learn to generate a torque representing an action for each state of the inverted pendulum by: receiving proprioceptive inputs in the form of angle and angular velocity of the pivot joint of the inverted pendulum; and generating the torque at the pivot joint of the inverted pendulum based on the received proprioceptive inputs to maintain an upright posture of the inverted pendulum.
 11. The system of claim 9, wherein the one or more hardware processors are configured to perform the step of modeling a neuronal controller by reproducing the Center Of Pressure (COP) characteristics in the human body either along the sagittal or along the frontal plane separately in accordance with a dynamical equation based on overall mass m of the human body being concentrated at the point of Center Of Mass (COM) located at the 2nd lumbar vertebra of the human body, an approximate height L of the 2nd lumbar vertebra, moment of inertia of the inverted pendulum I, angle of the inverted pendulum θ with respect to the direction of gravity, gravitational acceleration g, a stiffness constant K and a damping constant B, wherein the stiffness constant and the damping constant denote pivot joint properties of the inverted pendulum.
 12. The system of claim 11, wherein the parameters of the MDP model include: $n_{\theta}$ representing resolution of states in the $\theta$ domain (from $\theta_{max}$ to $-\theta_{max}$); $n_{\dot{\theta}}$ representing resolution of states in the $\dot{\theta}$ domain (from $\dot{\theta}_{max}$ to $-\dot{\theta}_{max}$); $n_{A}$ representing resolution of states in the $\tau$ domain (from $\tau_{max}$ to $-\tau_{max}$); $\tau_{max}$ representing the maximum torque exertable on the pivot joint or the boundary of the $\tau$ domain; $\theta_{max}$ representing the boundary of the $\theta$ domain; $\dot{\theta}_{max}$ representing a finite boundary of the $\dot{\theta}$ domain; $p_{\alpha}$ representing property of a curve from which torque values in the $\tau$ domain are sampled; $p_{\theta}$ representing property of a curve from which boundaries of states of the $\theta$ domain are sampled; and $p_{\dot{\theta}}$ representing property of a curve from which boundaries of the states of the $\dot{\theta}$ domain are sampled; and the parameters associated with physiology of the human body include: White Gaussian Noise (WGN) added to the generated torque to represent a real world noisy biological system; filtering factor (λ) added to portray signaling characteristics of a neuromuscular junction; and Scaling Factor (SF) introduced to scale down the magnitude of the generated torque for each state of the inverted pendulum after the training of the RL agent is completed, wherein completion of the training is represented by a balanced SIP.
 13. The system of claim 12, wherein a relationship between the generated torque by the modeled neuronal controller and the torque applied to the dynamical system based on the parameters associated with physiology of the human body is represented as: $\tau(t) = \lambda\tau_{1}(t) + (1 - \lambda)\tau(t - 1)$ and $\tau_{SF}(t) = \frac{\tau(t)}{SF},$ wherein $\tau_{1} = \tau_{NC} + c \times WGN$, $\tau(t)$ represents the torque applied to the dynamical system, $\tau_{SF}(t)$ represents the torque after introducing the Scaling Factor, $\tau_{NC}$ represents the generated torque by the modeled neuronal controller, WGN represents the White Gaussian Noise of magnitude 1, and c represents noise amplitude.
 14. The system of claim 13, wherein the dynamical equations of the SIP are represented as: for θ=0 to 90°, θ being the polar or zenith angle of the spherical coordinate system,
$\begin{aligned}
\ddot{\theta}_{x} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\frac{1}{2}\left(1 + 3\cos 2\theta_{x} - \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& + \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\tan^{2}\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& - 2\dot{\theta}_{x}^{2}\tan\theta_{x}\tan^{2}\theta_{y}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
& + \frac{g}{l}\frac{\tan\theta_{x}}{\sec^{2}\theta_{x}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{y}\tan\theta_{x}\tan\theta_{y} + \tau_{x}\sec^{2}\theta_{y}\right),
\end{aligned}$
and
$\begin{aligned}
\ddot{\theta}_{y} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\frac{1}{2}\left(1 + 3\cos 2\theta_{y} - \cos 2\theta_{x} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& + \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\tan^{2}\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& - 2\dot{\theta}_{y}^{2}\tan\theta_{y}\tan^{2}\theta_{x}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
& + \frac{g}{l}\frac{\tan\theta_{y}}{\sec^{2}\theta_{y}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{x}\tan\theta_{x}\tan\theta_{y} + \tau_{y}\sec^{2}\theta_{x}\right);
\end{aligned}$
and for θ > 90°,
$\begin{aligned}
\ddot{\theta}_{x} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\frac{1}{2}\left(1 + 3\cos 2\theta_{x} - \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& + \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{x}\sec^{4}\theta_{y}\tan\theta_{y}\tan^{2}\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& - 2\dot{\theta}_{x}^{2}\tan\theta_{x}\tan^{2}\theta_{y}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
& - \frac{g}{l}\frac{\tan\theta_{x}}{\sec^{2}\theta_{x}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{y}\tan\theta_{x}\tan\theta_{y} + \tau_{x}\sec^{2}\theta_{y}\right)
\end{aligned}$
and
$\begin{aligned}
\ddot{\theta}_{y} ={}& \frac{1}{\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)^{2}}\Big[\frac{1}{2}\left(1 + 3\cos 2\theta_{y} - \cos 2\theta_{x} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& + \left(1 + \cos 2\theta_{x} + \cos 2\theta_{y} + \cos 2\theta_{x}\cos 2\theta_{y}\right)\sec^{2}\theta_{y}\sec^{4}\theta_{x}\tan\theta_{x}\tan^{2}\theta_{y}\,\dot{\theta}_{x}\dot{\theta}_{y} \\
& - 2\dot{\theta}_{y}^{2}\tan\theta_{y}\tan^{2}\theta_{x}\left(\sec^{2}\theta_{y} + \tan^{2}\theta_{x}\right)\Big] \\
& - \frac{g}{l}\frac{\tan\theta_{y}}{\sec^{2}\theta_{y}}\sqrt{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}} + \frac{\sec^{2}\theta_{y} + \tan^{2}\theta_{x}}{ml^{2}\sec^{2}\theta_{x}\sec^{2}\theta_{y}}\left(\tau_{x}\tan\theta_{x}\tan\theta_{y} + \tau_{y}\sec^{2}\theta_{x}\right).
\end{aligned}$
 15. A computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: model the neuronal controller in the form of a Reinforcement Learning (RL) agent based on an inverted pendulum with 1 Degree Of Freedom (1 DOF) representing a first mechanical model of a human body in the form of a dynamical system, wherein the RL agent is trained using a Q-learning technique to learn an optimal state-action value (Q-value) function for a tuneable Markov Decision Process (MDP) model, wherein the modeled neuronal controller is configured to reproduce Center Of Pressure (COP) characteristics in the human body either along a sagittal plane or along a frontal plane separately; derive dynamical equations of a Spherical Inverted Pendulum (SIP) with respect to a global coordinate system, wherein the SIP represents a second mechanical model of the human body that exhibits postural sway along both the frontal plane and the sagittal plane together, wherein the dynamical equations are derived by using Lagrange's equations with two independent state variables (θx and θy) being angular deviation of the SIP about a pivot joint and along x and y axes respectively of the global coordinate system, wherein the pivot joint characterizes an ankle joint of the human body; and model the human postural sway both along the sagittal plane and along the frontal plane together using the modeled neuronal controller and the derived dynamical equations of the SIP by tuning (i) a reward function comprised in the modeled neuronal controller and (ii) a set of parameters to balance the SIP such that the postural sway characteristics generated by the modeled neuronal controller match the postural sway characteristics of one or more control subjects, wherein the set of parameters include parameters of the MDP model and parameters associated 
with physiology of the human body. 
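
By way of a non-limiting illustration that forms no part of the claims, the single-plane relationships recited above (the 1 DOF inverted pendulum dynamics, the noise, filtering and scaling of the generated torque, and a Q-learning update) may be sketched in Python as follows. The function names, the default values of m, L, dt, alpha and gamma, the semi-implicit Euler integration, and the use of a nested list as the Q-table are assumptions introduced only for this sketch. The stiffness constant K is assumed to act on the angle and the damping constant B on the angular velocity, in keeping with their stated roles as pivot joint properties.

```python
import math
import random


def pendulum_step(theta, theta_dot, tau, m=70.0, L=1.0, K=0.0, B=0.0,
                  g=9.81, dt=0.001):
    """Advance the 1 DOF inverted pendulum by one time step.

    Assumed form: I * theta_ddot = tau + m*g*L*sin(theta) - K*theta - B*theta_dot,
    with I = m * L**2. All default values are illustrative placeholders.
    """
    inertia = m * L ** 2
    theta_ddot = (tau + m * g * L * math.sin(theta)
                  - K * theta - B * theta_dot) / inertia
    # Semi-implicit Euler: update the velocity first, then the angle.
    theta_dot_next = theta_dot + theta_ddot * dt
    theta_next = theta + theta_dot_next * dt
    return theta_next, theta_dot_next


def condition_torque(tau_nc, tau_prev, lam, c, sf, rng):
    """Apply noise, filtering and scaling to the controller torque.

    tau_1  = tau_NC + c * WGN              (unit-variance Gaussian sample)
    tau    = lam * tau_1 + (1 - lam) * tau_prev
    tau_SF = tau / SF
    Returns (tau, tau_SF).
    """
    tau_1 = tau_nc + c * rng.gauss(0.0, 1.0)
    tau = lam * tau_1 + (1.0 - lam) * tau_prev
    return tau, tau / sf


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update on a nested-list Q-table (in place)."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


if __name__ == "__main__":
    rng = random.Random(0)
    theta, theta_dot, tau = 0.01, 0.0, 0.0
    for _ in range(5):
        # A trained controller would choose tau_nc from the Q-table; a simple
        # restoring torque is used here purely as a stand-in.
        tau_nc = -800.0 * theta
        tau, tau_sf = condition_torque(tau_nc, tau, lam=0.5, c=0.1, sf=1.0, rng=rng)
        theta, theta_dot = pendulum_step(theta, theta_dot, tau)
    print(theta, theta_dot)
```

In this sketch the filtered torque is fed back as the previous torque on the next step, and the Scaling Factor is applied only to the value returned alongside it, reflecting that scaling is introduced after training is completed.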